Parallel Replacement in Finite State Calculus 



Andre Kempe and Lauri Karttunen 

Rank Xerox Research Centre - Grenoble Laboratory 
6, chemin de Maupertuis - 38240 Meylan - France 

{kempe,karttunen}@xerox.fr   http://www.rxrc.xerox.com/grenoble/mltt/home.html






Abstract 

This paper extends the calculus of regular ex- 
pressions with new types of replacement ex- 
pressions that enhance the expressiveness of 
the simple replace operator defined in Kart- 
tunen (1995). Parallel replacement allows 
multiple replacements to apply simultaneously 
to the same input without interfering with 
each other. We also allow a replacement to 
be constrained by any number of alternative 
contexts. With these enhancements, the gen- 
eral replacement expressions are more versa- 
tile than two-level rules for the description of 
complex morphological alternations. 

1 Introduction 

A replacement expression specifies that a given
symbol or a sequence of symbols should be replaced
by another one in a certain context or contexts.
Phonological rewrite rules (Kaplan and Kay, 1994),
two-level rules (Koskenniemi 1983), syntactic
disambiguation rules (Karlsson et al 1994,
Koskenniemi, Tapanainen, and Voutilainen 1992), and
part-of-speech assignment rules (Brill 1992, Roche
and Schabes 1995) are examples of replacement in
context in finite-state grammars.

Kaplan and Kay (1994) describe a general
method for representing a replacement procedure as
a finite-state transduction. Karttunen (1995) takes a
somewhat simpler approach by introducing to the
calculus of regular expressions a replacement
operator that is defined just in terms of the other
regular expression operators. We follow the latter
approach here.

In the regular expression calculus, the replacement
operator, ->, is similar to crossproduct, in
that a replacement expression describes a relation
between two simple regular languages. Consequently,
replacement expressions can be conveniently
combined with other kinds of operations, such as
composition and union, to form complex expressions.

A replacement relation consists of pairs of strings 
that are related to one another in the manner 
sketched below: 

    x   ui^x   y   ui^y   z      upper string      [1]
    x   li^x   y   li^y   z      lower string

We use ui^x and ui^y to represent instances of Ui (with
i ∈ [1,n]) and li^x and li^y to represent instances of Li.
The upper string contains zero or more instances of
Ui, possibly interspersed with other material (denoted
here by x, y, and z). In the corresponding lower string
the sections corresponding to Ui are instances of Li,
and the intervening material remains the same
(Karttunen, 1995).

The -> operator makes the replacement obliga- 
tory, (->) makes it optional. For the sake of com- 
pleteness, we also define the inverse operators, <- 
and (<-), and the bidirectional variants, <-> and 
(<->). 

We have incorporated the new replacement ex- 
pressions into our implementation of the finite- 
state calculus (Kempe and Karttunen, 1995). 
Thus, we can construct transducers directly from 
replacement expressions as part of the general cal- 
culus, without invoking any special rule compiler. 

1.1 Simple regular expressions 

The table below describes the types of regular ex- 
pressions and special symbols that are used to de- 
fine the replacement operators. 



    (A)        option, [ A | 0 ]                                [2]
    A*         Kleene star
    A+         Kleene plus
    A/B        ignore (A possibly interspersed with
               strings from B)
    ~A         complement (negation)
    $A         contains (at least one) A
    A B        concatenation
    A | B      union
    A & B      intersection
    A - B      relative complement (minus)
    A .x. B    crossproduct (Cartesian product)
    A .o. B    composition
    0 or [ ]   epsilon (the empty string)
    [. .]      affects empty string replacement (sec. 2.2)
    ?          any symbol
    ?*         the universal ("sigma-star") language
               (contains all possible strings of any length
               including the empty string)
    .#.        string beginning or end (sec. 2.1)


Note that expressions that contain the crossproduct
(.x.) or the composition (.o.) operator describe
regular relations rather than regular languages.
A regular relation is a mapping from one regular
language to another one. Regular languages
correspond to simple finite-state automata;
regular relations are modelled by finite-state
transducers.

In the relation A .x. B, we call the first mem- 
ber, A, the upper language and the second mem- 
ber, B, the lower language. This choice of words 
is motivated by the linguistic tradition of writ- 
ing the result of a rule application underneath 
the original form. In a cascade of compositions,
R1 .o. R2 ... .o. Rn, which models a linguistic
derivation by rewrite rules, the upper side of the
first relation, R1, contains the "underlying lexical
form", while the lower side of the last relation, Rn,
contains the resulting "surface form".

We recognize two kinds of symbols: simple symbols
(a, b, c, etc.) and fst pairs (a:b, y:z, etc.).
An fst pair a:b can be thought of as the crossproduct
of a and b, the minimal relation consisting of a
(the upper symbol) and b (the lower symbol).

2 Parallel Replacement 

Conditional parallel replacement denotes a relation
which maps a set of n expressions Ui (i ∈ [1,n]) in
the upper language into a set of corresponding n
expressions Li in the lower language if, and only if,
they occur between a left and a right context (li, ri).

    { U1 -> L1 || l1 _ r1 } , ... , { Un -> Ln || ln _ rn }      [3]

Unconditional parallel replacement denotes a
similar relation where the replacement is not
constrained by contexts.

Conditional parallel replacement corresponds to
what Kaplan and Kay (1994) call "batch rules",
where a set of rules (replacements) is collected
together in a batch and performed in parallel, at the
same time, in such a way that all of them work on the
same input, i.e. none of them applies to the output
of another replacement.

2.1 Examples 

Regular expressions based on [3] can be abbreviated
if some of the UPPER-LOWER pairs, and/or
some of the LEFT-RIGHT pairs, are equivalent. The
complex expression:

    { a -> b , b -> c || x _ y } ;      [4]

which contains multiple replacements in one left and
right context, can be written in a more elementary
way as two parallel replacements:

    { a -> b || x _ y } , { b -> c || x _ y } ;      [5]




Figure 1: Transducer encoding [4] and [5] (Every arc 
with more than one label actually stands for a set of 
arcs with one label each.) 

Figure 1 shows the state diagram of a trans- 
ducer resulting from [4] or [5]. The transducer 
maps the string xaxayby to xaxbyby following the 
path 0-1-2-1-3-0-0-0 and the string xbybyxa to 
xcybyxa following the path 0-1-3-0-0-0-1-2. 
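
As an informal illustration outside the calculus itself, the behaviour of [4] and [5] on these two strings can be imitated with a small Python sketch. It is not the transducer construction of this paper: the function name parallel_replace and the tuple layout (upper, lower, left, right) are assumptions made for the example, and only single-symbol UPPER with literal string contexts is handled. Both contexts are checked on the input string, so no replacement sees the output of another.

```python
def parallel_replace(s, rules):
    """Apply single-symbol rules (upper, lower, left, right) in parallel.

    Hypothetical helper: contexts are literal strings and are always
    checked against the original input s, never against the output.
    """
    out = []
    for i, ch in enumerate(s):
        new = ch
        for upper, lower, left, right in rules:
            left_ok = s.endswith(left, 0, i)        # s[:i] ends with left
            right_ok = s.startswith(right, i + 1)   # s[i+1:] starts with right
            if ch == upper and left_ok and right_ok:
                new = lower
                break
        out.append(new)
    return "".join(out)


# Rules corresponding to { a -> b , b -> c || x _ y }
RULES_4 = [("a", "b", "x", "y"), ("b", "c", "x", "y")]

if __name__ == "__main__":
    print(parallel_replace("xaxayby", RULES_4))  # xaxbyby
    print(parallel_replace("xbybyxa", RULES_4))  # xcybyxa
    print(parallel_replace("xayxby", RULES_4))   # xbyxcy
```

The last call shows the parallel character: the b produced from a is not rewritten to c, which a sequential application of the two rules would do.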

The complex expression

    { a -> b , b -> c || x _ y , v _ w } ,      [6]
    { a -> c || p _ q } ;

contains five single parallel replacements:

    { a -> b || x _ y } ,      [7]
    { a -> b || v _ w } ,
    { b -> c || x _ y } ,
    { b -> c || v _ w } ,
    { a -> c || p _ q } ;
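
The expansion from [6] to [7] amounts to a Cartesian product of the UPPER-LOWER pairs with the LEFT-RIGHT pairs inside each group. A minimal Python sketch, assuming a hypothetical representation of each group as a pair (list of (upper, lower), list of (left, right)):

```python
from itertools import product


def expand(groups):
    """Expand complex replacement groups into elementary replacements.

    Each group is (pairs, contexts); the result lists one
    (upper, lower, left, right) tuple per combination.
    """
    return [
        (upper, lower, left, right)
        for pairs, contexts in groups
        for (upper, lower), (left, right) in product(pairs, contexts)
    ]


# Expression [6] in the assumed representation
EXPR_6 = [
    ([("a", "b"), ("b", "c")], [("x", "y"), ("v", "w")]),
    ([("a", "c")], [("p", "q")]),
]

if __name__ == "__main__":
    for upper, lower, left, right in expand(EXPR_6):
        print(f"{{ {upper} -> {lower} || {left} _ {right} }}")
```

Run on EXPR_6, the sketch lists the five elementary replacements in the order of [7].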

Contexts can be unspecified as in

    { a -> b || x _ y , v _ , _ w } ;      [8]

where a is replaced by b only when occurring between
x and y, or after v, or before w.

An unspecified context is equivalent to ?*, the
universal (sigma-star) language. Similarly, a
specified context, such as x _ y, is actually interpreted
as ?* x _ y ?*, that is, implicitly extending the
context to infinity on both sides of the replacement.
This is a useful convention, but we also need to be
able to refer explicitly to the beginning or the end
of a string. For this purpose, we introduce a special
symbol, .#. (Kaplan and Kay, 1994, p. 349).

In the example

    { a -> b || .#. _ , v _ ? ? .#. } ;      [9]

a is replaced by b only when it is at the beginning
of a string or between v and the two final symbols
of a string (see footnote 1).
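
For readers more familiar with conventional regular expression libraries, the two contexts of [9] can be approximated with anchors and lookaround in Python's re module. This is only an analogy under the assumption of single-symbol UPPER; the names PATTERN_9 and apply_rule9 are invented for the sketch, \A and \Z play the role of .#., and each dot plays the role of ?.

```python
import re

# Hypothetical regex counterpart of { a -> b || .#. _ , v _ ? ? .#. }:
#   \Aa            : a at the very beginning of the string
#   (?<=v)a(?=..\Z): a after v and followed by exactly two final symbols
PATTERN_9 = re.compile(r"\Aa|(?<=v)a(?=..\Z)", re.DOTALL)


def apply_rule9(s):
    """Replace a by b in the two contexts described above."""
    return PATTERN_9.sub("b", s)


if __name__ == "__main__":
    print(apply_rule9("ava12"))    # bvb12
    print(apply_rule9("xavaxyz"))  # xavaxyz (neither context holds)
```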

2.2 Replacement of the Empty String 

The language described by the UPPER part of a
replacement expression (see footnote 2)

    UPPER -> LOWER || LEFT _ RIGHT      [10]

can contain the empty string ε. In this case, every
string that is in the upper-side language of the
relation is mapped to an infinite set of strings in the
lower-side language, as the upper-side string can be
considered a concatenation of empty and non-empty
substrings, with ε at any position and in
any number. E.g.

    a* -> x || _ ;      [11]

maps the string bb to the infinite set of strings bb,
xbb, xbxb, xbxbx, xxbb, etc., since the language
described by a* contains ε, and the string bb can
be considered as a result of any one of the
concatenations b^b, ε^b^b, ε^b^ε^b, ε^b^ε^b^ε,
ε^ε^b^b, etc.

1 Note that .#. denotes the beginning or the end of a
string depending on whether it occurs in the left or the right
context.

2 We describe this topic only for uni-directional replacement
from the upper to the lower side of a regular relation,
but analogous statements can be made for all other types of
replacement mentioned in section 3.

For many practical purposes it is convenient to
construct a version of empty-string replacement
that allows only one application between any two
adjacent symbols (Karttunen, 1995). In order not
to confuse the notation by a non-standard
interpretation of the notion of empty string, we introduce a
special pair of brackets, [. .], placed around the
upper side of a replacement expression that
presupposes a strict alternation of empty substrings and
non-empty substrings of exactly one symbol:

    ε x ε y ε z ε      [12]




In applying this to the above example, we obtain

    [. a* .] -> x || _ ;      [13]

that maps the string bb only to xbxbx since bb is
here considered exclusively as a result of the
concatenation ε^b^ε^b^ε.
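
The contrast between [11] and [13] can be made concrete with a small Python sketch. It covers only the special case used in the example, an input such as bb that contains no non-empty instance of UPPER, and both function names are invented here. The first function enumerates a finite sample of the infinite set produced by [11]; the second implements the single insertion per gap imposed by [. .].

```python
from itertools import product


def insert_up_to(s, lower, max_per_gap):
    """Finite sample of the outputs of unrestricted empty-string replacement.

    Between any two symbols (and at both ends) lower may be inserted
    0..max_per_gap times; the real relation has no such upper bound.
    """
    results = set()
    for counts in product(range(max_per_gap + 1), repeat=len(s) + 1):
        pieces = [lower * counts[0]]
        for ch, k in zip(s, counts[1:]):
            pieces.append(ch + lower * k)
        results.add("".join(pieces))
    return results


def insert_once_per_gap(s, lower):
    """Strict alternation as in [12]: exactly one insertion per gap."""
    return lower.join([""] + list(s) + [""])


if __name__ == "__main__":
    print(sorted(insert_up_to("bb", "x", 1)))
    print(insert_once_per_gap("bb", "x"))  # xbxbx
```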

If contexts are specified (in contrast to the
above example) then they are taken into account.

2.3 The Algorithm 
2.3.1 Auxiliary Brackets 

The replacement of one substring by another one
inside a context requires the introduction of
auxiliary symbols (e.g. brackets). Kaplan and Kay
(1994) motivate this step.

If we used an expression like

    li [ Ui .x. Li ] ri      [14]

to map a particular Ui (i ∈ [1,n]) to Li when
occurring between a left and a right context, li and ri,
then every li and ri would map a substring adjacent
to Ui.

However, this approach is impossible for the
following reason (Kaplan and Kay, 1994): In an
example like

    { a -> b || x _ x }      [15]

where we expect xaxax to be replaced by xbxbx,
the middle x serves as a context for both a's. A
relation described by [14] could not accomplish this.
The middle x would be mapped either by an ri or
by an li but not by both at the same time. That is
why only one a could be replaced and we would get
two alternative lower strings, xbxax and xaxbx.
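
The same effect can be observed with an ordinary regular expression engine, which offers a loose analogy to the problem (the function names below are invented for this sketch). If the contexts are part of the matched substring, the middle x is consumed by the first match and is no longer available to the second; if the contexts are only inspected, both a's are replaced.

```python
import re


def consume_contexts(s):
    """Contexts are matched together with a, as in [14]."""
    return re.sub(r"xax", "xbx", s)


def inspect_contexts(s):
    """Contexts are only checked (lookaround), not consumed."""
    return re.sub(r"(?<=x)a(?=x)", "b", s)


if __name__ == "__main__":
    print(consume_contexts("xaxax"))  # xbxax
    print(inspect_contexts("xaxax"))  # xbxbx
```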

Therefore, we have to use the contexts, li and ri,
without mapping them. For this purpose we
introduce auxiliary brackets <i after every left context
li and >i before every right context ri. The
replacement maps those brackets without looking at
the actual contexts.

We need separate brackets for empty and non-empty
UPPER. If we used the same bracket for both,
this would mean an overlap of the substrings to
replace in an example like x >1 <1 a >1. Here we
might have to replace >1 <1 and <1 a >1, where <1
is part of both substrings. Because of this overlap,
we could not replace both substrings in parallel, i.e.
at the same time. Making the two replacements
sequentially is also impossible in either order, for
reasons explained in detail in (Kempe and
Karttunen, 1995).

A regular relation describing replacement in
context (and a transducer that represents it) is defined
by the composition of a set of "simpler" auxiliary
relations. Context brackets occur only in
intermediate relations and are not present in the final
result.



2.3.2 Preparatory Steps 

Before the replacement we make the following three
transformations:

(1) Complex regular expressions like [4] are
transformed into elementary ones like [5], where
every single replacement consists of only one UPPER,
one LOWER, one LEFT and one RIGHT expression.
E.g.

    { [.(a).] -> b || x _ y } ,      [16]
    { [ ] -> c , e -> f || v _ w } ;

would be expanded to

    { [.(a).] -> b || x _ y } ,      [17]
    { [ ] -> c || v _ w } ,
    { e -> f || v _ w } ;




(2) Since we have to use different types of brackets
for the replacement of empty and non-empty
UPPER (cf. 2.3.1), we split the set of parallel
replacements into two groups, one containing only
replacements with empty UPPER and the other one
only with non-empty UPPER. If an UPPER contains
the empty string but is not identical with it, the
replacement will be added to both groups but with
a different UPPER. E.g. [17] would be split into

    { a -> b || x _ y } ,      [18]
    { e -> f || v _ w } ;

the group of non-empty UPPER and

    { [. .] -> b || x _ y } ,      [19]
    { [ ] -> c || v _ w } ;

the group of empty UPPER.

(3) All empty UPPER of type [ ] are transformed
into type [. .] and the corresponding
LOWER are replaced by their Kleene star.
E.g. [19] would be transformed into

    { [. .] -> b || x _ y } ,      [20]
    { [. .] -> c* || v _ w } ;

The following algorithm of conditional parallel
replacement will consider all empty UPPER as being
of type [. .], i.e. as not being adjacent to another
empty string.

2.3.3 The Replacement itself 

Apart from the previously explained symbols, we
will make use of the following symbols in the next
regular expressions:

    <allE     [ <1E | ... | <mE ], union of all left brackets      [21]
              for empty UPPER.
    >allE     [ >1E | ... | >mE ], union of all right brackets
              for empty UPPER.
    ><allE    [ <allE | >allE ]
    <allNE    [ <1 | ... | <n ], union of all left brackets for
              non-empty UPPER.
    >allNE    [ >1 | ... | >n ], union of all right brackets for
              non-empty UPPER.
    ><allNE   [ <allNE | >allNE ]
    <all      [ <allE | <allNE ]
    >all      [ >allE | >allNE ]
    ><all     [ <all | >all ]
    ./        Ignore-inside operator.
              Example: abc./x = [abc/x] - [x ?*] - [?* x],
              inside the string abc, i.e. between a and b
              and between b and c, all x will be ignored
              any number of times.




We compose the conditional parallel replacement 
of the six auxiliary relations described by Kaplan 
and Kay (1994) and Karttunen (1995) which are: 

(1) InsertBrackets      [22]

(2) ConstrainBrackets 

(3) LeftContext 

(4) RightContext 

(5) Replace 

(6) RemoveBrackets 

The composition of these relations in the above 
order, defines the upward-oriented replacement. 
The resulting transducer maps UPPER inside an in- 
put string to LOWER, when UPPER is between LEFT 
and RIGHT in the input context, leaving everything 
else unchanged. Other variants of the replacement 
operator will be defined later. 
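
To give an intuition for how the six steps cooperate, the following Python sketch imitates them on plain strings for one elementary replacement. It is a strong simplification and not the transducer composition defined here: it assumes a single-symbol UPPER, literal string contexts, a single bracket pair written < and >, and an input that contains neither bracket. The function name replace_with_brackets is invented for the sketch. Steps (1) to (4) are collapsed into placing exactly those brackets that the contexts license.

```python
def replace_with_brackets(s, upper, lower, left, right):
    """Bracket-based replacement of a single symbol in context.

    Simplifying assumptions: len(upper) == 1 and neither '<' nor '>'
    occurs in s.
    """
    if len(upper) != 1 or "<" in s or ">" in s:
        raise ValueError("sketch handles single-symbol UPPER, bracket-free input")

    # Steps (1)-(4): '>' before every right context, '<' after every
    # left context; '>' precedes '<' at the same position, as in [25].
    marked = []
    for p in range(len(s) + 1):
        if s.startswith(right, p):
            marked.append(">")
        if s.endswith(left, 0, p):
            marked.append("<")
        if p < len(s):
            marked.append(s[p])
    marked = "".join(marked)

    # Step (5): map every bracketed UPPER to the bracketed LOWER.
    replaced = marked.replace("<" + upper + ">", "<" + lower + ">")

    # Step (6): remove all brackets.
    return replaced.replace("<", "").replace(">", "")


if __name__ == "__main__":
    print(replace_with_brackets("xaxax", "a", "b", "x", "x"))    # xbxbx
    print(replace_with_brackets("xaxayby", "a", "b", "x", "y"))  # xaxbyby
```

For xaxax the intermediate string is >x<a>x<a>x<, in which the middle x licenses a bracket on each side without being consumed, so both a's are replaced.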

For every single replacement { Ui -> Li || li
_ ri } we introduce a separate pair of brackets <i
and >i with i ∈ [1E...mE] if UPPER is identical
with the empty string and i ∈ [1...n] if UPPER does
not contain the empty string. A left bracket <i
indicates the end of a complete left context. A right
bracket >i marks the beginning of a complete right
context.

We define the component relations in the
following way. Note that UPPER, LOWER, LEFT and
RIGHT (Ui, Li, li and ri) stand for regular
expressions of any complexity but restricted to denote
regular languages. Consequently, they are
represented by networks that contain no fst pairs.

(1) InsertBrackets

    [ ] <- ><all      [23]

The relation inserts instances of all brackets on
the lower side (everywhere and in any number and
order).

(2) ConstrainBrackets

    ~$[ >allE >allNE ]      [24]
    & ~$[ <allE >all ]
    & ~$[ <allNE [ <allE | >all ] ]

The constraint refers not to individual brackets
but to their types and allows them to be only in
the following order:

    >allNE* >allE* <allE* <allNE*      [25]

The composition of the steps (1) and (2) invokes
this constraint, which is necessary for the following
reasons:

If we allowed sequences like <3 U3 <1 >3 U1 >1
we would have an overlap of the two substrings
<3 U3 >3 and <1 U1 >1 which have to be replaced.
Here, either U1 or U3 could be replaced but not
both at the same time.

If we permitted sequences like >1E <2 <1E U2 >2
we would also have an overlap of the two
replacements, which means we could either replace
<2 U2 >2 or >1E <1E but not both.
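
The order [25] on a run of adjacent brackets can be checked mechanically. The Python sketch below assumes a hypothetical textual encoding of brackets, such as ">2" or "<1E" with a trailing E marking brackets for empty UPPER; each bracket is mapped to one letter per type, and the run is accepted if the letter sequence matches a*b*c*d*.

```python
import re

# One letter per bracket type, in the order required by [25]:
#   a: >allNE   b: >allE   c: <allE   d: <allNE
ORDER_25 = re.compile(r"a*b*c*d*")


def bracket_class(bracket):
    """Classify a bracket such as '>2' or '<1E' (assumed encoding)."""
    is_right = bracket.startswith(">")
    is_empty = bracket.endswith("E")
    if is_right:
        return "b" if is_empty else "a"
    return "c" if is_empty else "d"


def is_allowed_run(brackets):
    """True if a run of adjacent brackets respects the order [25]."""
    classes = "".join(bracket_class(b) for b in brackets)
    return ORDER_25.fullmatch(classes) is not None


if __name__ == "__main__":
    print(is_allowed_run([">2", ">1", ">1E", "<1E", "<2", "<1"]))  # True
    print(is_allowed_run(["<1", ">3"]))                            # False
    print(is_allowed_run([">1E", "<2", "<1E"]))                    # False
```

The two rejected runs are the adjacent bracket sequences from the overlapping examples discussed above.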

(3) LeftContext

    λ1 & ... & λn      [26]

for all i ∈ [1E...mE, 1...n], λi =

    ~$[ ~[li./><all] (><all - <i)* <i ]
    & ~$[ [li./><all] (><all - <i)* ~<i ]

The constraint forces every instance of a left
bracket <i to be immediately preceded by the
corresponding left context li and every instance of li to
be immediately followed by <i, ignoring all brackets
that are different from <i in between, and all
brackets (<i included) inside li (./). We make the
constraint λi separately for every <i and li and then
intersect them in order to get the constraint for all
left brackets and contexts.

(4) RightContext

    ρ1 & ... & ρn      [27]

for all i ∈ [1E...mE, 1...n], ρi =

    ~$[ >i (><all - >i)* ~[ri./><all] ]
    & ~$[ ~>i (><all - >i)* [ri./><all] ]

The constraint relates instances of right brackets
>i and of right contexts ri, and is the mirror image
of step (3). We derive it from the left context
constraint by reversing every right context ri
before making the single constraints λi (not ρi) and
reversing the result again after having intersected
all λi.

(5) Replace

    [ N R ]* N      [28]

The relation maps every bracketed UPPER,
<i Ui >i for non-empty UPPER and >i <i for empty
UPPER, to the corresponding bracketed LOWER,
<i Li >i, leaving everything else unchanged.

The term N in [28] means a string that does not
contain any bracketed UPPER:

    N = N1E & ... & NmE & N1 & ... & Nn      [29]

A particular bracketed empty UPPER >i <i is
excluded from the corresponding Ni (i ∈ [1E,mE])
by

    Ni = ~$[ >i [><allE - >i - <i]* <i ]      [30]

and a bracketed non-empty UPPER <i Ui >i is
excluded from the corresponding Ni (i ∈ [1,n]) by

    Ni = ~$[ <i [<allNE - <i]*      [31]
             [Ui./><all] [>allNE - >i]* >i ]

The term R in expression [28] abbreviates a
relation that maps any bracketed UPPER to the
corresponding bracketed LOWER. It is the union of all
single Ri relations mapping all occurrences of one
Ui (empty and non-empty) to the corresponding
Li.

    R = R1E | ... | RmE | R1 | ... | Rn      [32]

The replacement Ri of non-empty UPPER
Ui (i ∈ [1,n]) is performed by:

    <i [ [Ui./><all] .x. [Li./><all] ] >i      [33]

To illustrate this: Suppose we have a set of
replacements containing among others

    a -> b || x _ y ;      [34]

This particular replacement is done by mapping,
inside an input string, every substring <1 a >1 as in

    ... x >2 >1 >1E <1E <2 <1 a >1 >2 >1E <1E <1 <2 y ...      [35]

using the brackets <1 and >1, to a substring
<1 b >1:

    ... x >2 >1 >1E <1E <2 <1 b >1 >2 >1E <1E <1 <2 y ...      [36]

The replacement Ri of empty UPPER Ui
(i ∈ [1E,mE]) is performed by:

    [ 0 .x. [ [><allE - <i] | [<allNE] ] ]*      [37]
    [ >i .x. <i ] [ 0 .x. [Li./><all] ] [ <i .x. >i ]
    [ 0 .x. [ [><allE - >i] | [>allNE] ] ]*

In the following example we replace the empty
U2E by L2E. Suppose we have in total one
replacement of non-empty UPPER and two of empty
UPPER, one of which is

    [. .] -> b || x _ y ;      [38]

This replacement is done by mapping, inside a
string, every substring >2E <2E as in

    ... x >1 >1E >2E <2E <1E <1 y ...      [39]

using the brackets >2E <2E, into a substring

    ... x >1 >1E [>2E | >1E | <1E | <1]*      [40]
    <2E b >2E
    [>1 | >1E | <1E | <2E]* <1E <1 y ...

The occurrence of exactly one bracket pair >iE
and <iE between a left and a right context actually
corresponds to the definition of a (single) empty
string expressed by [. .] (cf. sec. 2.2).

The brackets [>2E | >1E | <1E | <1] and
[>1 | >1E | <1E | <2E] in [40] are inserted on the
lower side any number of times (including zero), i.e.
they exist optionally, which makes them present if
checking for the left or right context requires them,
and absent if they are not allowed in this place.
This set of brackets does not contain the ones
used for the replacement, >i <i, because if we later
check for them we do not want this check to be
always satisfied but only when the specified contexts
are present, in order to be able to confirm or to
cancel the replacement a posteriori.

Likewise, this set of optionally inserted brackets
does not contain those which potentially could be
used for the replacement of adjacent non-empty
strings, i.e. >allNE on the left and <allNE on the
right side of the expression. Otherwise, checking
later for the legitimacy of the adjacent
replacements would no longer be possible.

(6) RemoveBrackets

    ><all -> [ ]      [41]

The relation eliminates from the lower-side
language all brackets that appear on the upper side.

3 Variants of Replacement 

3.1 Application of context constraints 

We distinguish four ways in which context can constrain
the replacement. The difference between them is
where the left and the right contexts are expected,
on the upper or on the lower side of the relation, i.e.
LEFT and RIGHT contexts can be checked before or
after the replacement.

We obtain these four different applications of
context constraints (denoted by ||, //, \\ and
\/) by varying the order of the auxiliary relations
(steps (3) to (5)) described in section 2.3.3
(cf. [22]):

(a) Upward-oriented

    { U1 -> L1 || l1 _ r1 } , ... , { Un -> Ln || ln _ rn }      [42]
    ... LeftContext .o. RightContext .o. Replace ...

(b) Right-oriented

    { U1 -> L1 // l1 _ r1 } , ...      [43]
    ... RightContext .o. Replace .o. LeftContext ...

(c) Left-oriented

    { U1 -> L1 \\ l1 _ r1 } , ...      [44]
    ... LeftContext .o. Replace .o. RightContext ...

(d) Downward-oriented

    { U1 -> L1 \/ l1 _ r1 } , ...      [45]
    ... Replace .o. LeftContext .o. RightContext ...

The versions (a) to (c) roughly correspond to 
the three alternative interpretations of phonolog- 
ical rewrite rules discussed in Kaplan and Kay 
(1994). The upward-oriented version corresponds 
to the simultaneous rule application; the right- and 
left-oriented versions can model rightward or left- 
ward iterating processes, such as vowel harmony 
and assimilation. 

In the downward-oriented replacement the operation
is constrained by the lower (left and right)
context. Here the Ui get mapped to the corresponding
Li just in case they end up between li
and ri in the output string.
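
The difference between versions (a), (b) and (c) can be illustrated by a Python sketch that works directly on strings. It rests on several assumptions that go beyond the definitions above: a single replacement with single-symbol UPPER, literal string contexts, and a simple scanning procedure in place of the composed relations. A context placed after Replace in the composition is checked against the output built so far; a context placed before Replace is checked against the input. All function names are invented for the sketch.

```python
def upward(s, upper, lower, left, right):
    """Version (a), ||: both contexts are checked on the input."""
    return "".join(
        lower if ch == upper and s.endswith(left, 0, i) and s.startswith(right, i + 1)
        else ch
        for i, ch in enumerate(s)
    )


def right_oriented(s, upper, lower, left, right):
    """Version (b), //: left context on the output, right context on the input."""
    out = ""
    for i, ch in enumerate(s):
        if ch == upper and out.endswith(left) and s.startswith(right, i + 1):
            out += lower
        else:
            out += ch
    return out


def left_oriented(s, upper, lower, left, right):
    """Version (c), \\\\: left context on the input, right context on the output."""
    out = ""
    for i in range(len(s) - 1, -1, -1):
        ch = s[i]
        if ch == upper and out.startswith(right) and s.endswith(left, 0, i):
            out = lower + out
        else:
            out = ch + out
    return out


if __name__ == "__main__":
    # a -> b after b: only the first a qualifies on the input side,
    # but the change spreads rightward when the output is consulted.
    print(upward("baaa", "a", "b", "b", ""))          # bbaa
    print(right_oriented("baaa", "a", "b", "b", ""))  # bbbb
    # a -> b before b: the mirror-image case spreads leftward.
    print(upward("aaab", "a", "b", "", "b"))          # aabb
    print(left_oriented("aaab", "a", "b", "", "b"))   # bbbb
```

The spreading behaviour in the second and fourth calls is the kind of iterating process, such as harmony or assimilation, that the right- and left-oriented versions are meant to model.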

3.2 Inverse, bidirectional and optional 
replacement 

Replacement as described above, ->, maps every
Ui on the upper side unambiguously to the
corresponding Li on the lower side but not vice versa.
An Li on the lower side gets mapped to Li or Ui on
the upper side.

The inverse replacement, <-, maps unambigu- 
ously from the lower to the upper side only. The 
bidirectional replacement, <->, is unambiguous in 
both directions. 

Replacements of all of these three types
(directions) can be optional, (->) (<-) (<->), i.e. they
are either made or not. We define such a relation
by changing N (the part not containing any
bracketed UPPER) in expression [28] into ?* that accepts
every substring:

    [ ?* R ]* ?*      [46]

Here a Ui is either mapped by the corresponding
Ri contained in R (cf. [32]) and therefore replaced
by Li, or it is mapped by ?* and not replaced.
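
Since every eligible Ui may independently be replaced or left unchanged, an input with k eligible sites is related to up to 2^k output strings. The Python sketch below enumerates them for one optional replacement, assuming single-symbol UPPER, literal string contexts checked on the input, and an invented function name.

```python
from itertools import product


def optional_replace(s, upper, lower, left, right):
    """Return the set of outputs of an optional replacement (->)."""
    sites = [
        i for i, ch in enumerate(s)
        if ch == upper and s.endswith(left, 0, i) and s.startswith(right, i + 1)
    ]
    results = set()
    for choice in product([False, True], repeat=len(sites)):
        chars = list(s)
        for site, do_replace in zip(sites, choice):
            if do_replace:
                chars[site] = lower
        results.add("".join(chars))
    return results


if __name__ == "__main__":
    # a (->) b || x _ y applied to xayxay
    print(sorted(optional_replace("xayxay", "a", "b", "x", "y")))
```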

4 A Practical Application 

In this section we illustrate the usefulness of the 
replace operator using a practical example. 

We show how a lexicon of French verbs ending in 
-ir, inflected in the present tense subjunctive mood, 
can be derived from a lexicon containing the corre- 
sponding present indicative forms. We assume here 
that irregular verbs are encoded separately. 

It is often proposed that the present subjunctive
of -ir verbs be derived, for the most basic case, from
a stem in -iss- (e.g.: finir/finiss) rather than from
a more general root (e.g.: fin(i)) because once this
stem is assumed, the subjunctive ending itself
becomes completely regular:



    (that I finish)          (that I run)
    que je finiss-e          que je cour-e
    que tu finiss-es         que tu cour-es
    qu'ils finiss-ent        qu'ils cour-ent

The algorithm we propose here is straightforward:
We first derive the present subjunctive stem
from the third person plural present indicative
(e.g.: finiss, cour), then append the suffix
corresponding to the given person and number.

The first step can be described as follows: 

    define LETTER :      [47]
        a | b | c | d | ... ;

    define TAG :      [48]
        SubjP | ... | SG | ... | P3 | ... | Verb | ... ;

    define StemRegular :      [49]
        [ [. .] <-> IndP PL P3 Verb || LETTER _ TAG ]
        .o.
        [ LexInd TAG+ ]
        .o.
        [ e n t <-> SUFF || _ TAG ] ;

The first transducer in [49] inserts the tags of the
third person plural present indicative between the
word and the tags of the actually required
subjunctive form. The second transducer in [49], which is an
indicative lexicon of -ir verbs concatenated with a
sequence of at least one tag, provides the indicative
form and keeps the initial subjunctive tags.
The last transducer in [49] replaces the suffix -ent
by the symbol SUFF. E.g.:

    finir SubjP_PL_P2_Verb
    finir IndP_PL_P3_Verb SubjP_PL_P2_Verb
    finissent SubjP_PL_P2_Verb
    finiss SUFF SubjP_PL_P2_Verb

To append the appropriate suffix to the subjunc- 
tive stem, we use the following transducer which 
maps the symbol SUFF to a suffix and deletes all 
tags: 

    define Suffix :      [50]
        [ { SUFF -> e       || _ TAG* SG [P1|P3] },
          { SUFF -> e s     || _ TAG* SG P2 },
          { SUFF -> i o n s || _ TAG* PL P1 },
          { SUFF -> i e z   || _ TAG* PL P2 },
          { SUFF -> e n t   || _ TAG* PL P3 } ]
        .o.
        [ TAG -> [ ] ] ;

The complete generation of subjunctive forms can 
be described by the composition: 

    define LexSubjP :      [51]
        StemRegular .o. Suffix ;

The resulting (single) transducer LexSubjP rep- 
resents a lexicon of present subjunctive forms of 
French verbs ending in -ir. It maps the infinitive of 
those verbs followed by a sequence of subjunctive 
tags, to the corresponding inflected surface form 
and vice versa. 

All intermediate transducers mentioned in this 
section will contribute to this final transducer but 
will themselves disappear. 
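
The generation direction of this derivation can be paraphrased procedurally. The Python sketch below is not a finite-state implementation and does not provide the inverse (analysis) direction that the transducer gives for free. The two-entry dictionary stands in for the indicative lexicon LexInd and is purely illustrative, and the function name subjunctive is an assumption of the sketch.

```python
# Hypothetical toy stand-in for LexInd: infinitive -> third person
# plural present indicative.
LEX_IND_PL_P3 = {
    "finir": "finissent",
    "courir": "courent",
}

# Suffix table corresponding to [50].
SUFFIXES = {
    ("SG", "P1"): "e",
    ("SG", "P3"): "e",
    ("SG", "P2"): "es",
    ("PL", "P1"): "ions",
    ("PL", "P2"): "iez",
    ("PL", "P3"): "ent",
}


def subjunctive(infinitive, number, person):
    """Derive a present subjunctive form as in [49] to [51]."""
    indicative = LEX_IND_PL_P3[infinitive]   # lookup of the IndP PL P3 form
    if not indicative.endswith("ent"):
        raise ValueError("expected a form ending in -ent")
    stem = indicative[: -len("ent")]         # e n t <-> SUFF
    return stem + SUFFIXES[(number, person)]  # SUFF -> suffix, tags dropped


if __name__ == "__main__":
    print(subjunctive("finir", "PL", "P2"))   # finissiez
    print(subjunctive("courir", "SG", "P1"))  # coure
```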

The regular expressions in this section could also
be written in the two-level formalism
(Koskenniemi, 1983). However, some of them can be
expressed more conveniently in the above way,
especially when the replace operator is used.
E.g., the first line of [49], written above as:

    [. .] <-> IndP PL P3 Verb || LETTER _ TAG      [52]

would have to be expressed in the two-level
formalism by four rules:

    0:IndP <=> LETTER _ (:PL) (:P3) (:Verb) TAG ;      [53]
    0:PL   <=> LETTER (:IndP) _ (:P3) (:Verb) TAG ;
    0:P3   <=> LETTER (:IndP) (:PL) _ (:Verb) TAG ;
    0:Verb <=> LETTER (:IndP) (:PL) (:P3) _ TAG ;

Here, the difficulty comes not only from the large
number of rules we would have to write in the above
example, but also from the fact that writing one of
these rules requires keeping all the others in mind,
to avoid inconsistencies between them.